It is suggested here that in many environmental and other
contexts the severity of an extreme event might usefully
be represented by the sum of the excesses of a measured
variable over a high threshold. The general form of the
limiting distributions of such sums for a wide class of
models has been derived by Anderson and Dancy, and has
suggested methods for the statistical analysis of data
concerning extreme severity. This work is reviewed here,
and some extensions to the distributional theory are
presented. An application of the methods to atmospheric
ozone levels, which calls for the extension of the approach
to take account of covariate information, is reported.

1. Introduction

The severity of a storm or a flood is often a func- 
tion not only of the peak value of whichever envi- 
ronmental variable is concerned, but also of other 
aspects of the extreme event, such as its duration 
and temporal shape. An extended run of days with 
temperatures just below freezing, for example, can 
be more disruptive to everyday human activity and 
to animal and plant life than a single day with a 
much sharper frost. Similarly, sustained moderately 
high water levels in a river or the sea can lead to 
greater flooding than a more extreme level lasting 
for only a short time. To attempt to analyze such 
examples in a way which captures the notion of 
severity implicit in them demands an extension of
traditional statistical methods for extremes, which
have tended to concentrate largely on the modelling
of maxima or storm peaks. In Ref. [1] it was
suggested that for an important class of applica- 
tions a simple way to quantify the idea of severity 
is in terms of the sum of the excesses of the envi- 
ronmental variable over a high threshold during 
the extreme event. In the case of a flood, for example,
this sum or aggregate excess is a discrete approximation
to the total volume of water
overtopping the threshold, and in the case of tem- 
peratures the analogous quantity defined for low 
values, the aggregate deficit, is a measure of expo- 
sure or cumulative damage. In the earlier paper 
some distribution theory was developed for aggre- 
gate excesses, and an application to flood data was 
discussed. Here I review that work and present 
some extensions of its distributional results, and 
discuss a new application to ozone concentrations. 



2. Preliminaries 

The techniques to be described are related to 
threshold methods for extremes [2], and the distri- 
butional results are formulated in terms of the 
Mori-Hsing point process representation [3, 4] for 
the structure of high values of a stationary 
sequence. We briefly recall ideas from these two
areas.

Suppose $\{X_j\}$ denotes a sequence of observations,
and let $u$ be a high threshold. Times $j$ at which
$X_j > u$ are referred to as exceedances of $u$ by $\{X_j\}$,
and the sizes of overshoots $X_j - u$ at exceedances
are called excesses over the level u. In environmen- 
tal applications exceedances are often found to oc- 
cur in clusters corresponding to physical storms. 
Threshold methods are based on the modelling of 
the peak excess within each cluster by a generalized 
Pareto distribution, with distribution function of 
the form 



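$$G(x; \sigma, \xi) = 1 - \left(1 + \frac{\xi x}{\sigma}\right)^{-1/\xi} \qquad (1)$$

where $\sigma > 0$ is a scale parameter, $\xi$ ($-\infty < \xi < \infty$) is
a shape parameter, and the range of $x$ is such that
$x > 0$ and $1 + \xi x/\sigma > 0$.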
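
As a minimal numerical sketch (not part of the original paper), the generalized Pareto distribution function $1 - (1 + \xi x/\sigma)^{-1/\xi}$ can be evaluated as follows. The function name `gpd_cdf` is hypothetical, and treating $\xi = 0$ as the exponential limit $1 - e^{-x/\sigma}$ is an assumption made here for completeness.

```python
import math


def gpd_cdf(x, sigma, xi):
    """Generalized Pareto distribution function for an excess x (sketch)."""
    if sigma <= 0:
        raise ValueError("scale parameter sigma must be positive")
    if x <= 0:
        return 0.0
    if xi == 0.0:
        # Assumed limiting case xi -> 0: exponential with mean sigma.
        return 1.0 - math.exp(-x / sigma)
    base = 1.0 + xi * x / sigma
    if base <= 0.0:
        # Only possible for xi < 0: x is beyond the upper end point -sigma/xi.
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


print(gpd_cdf(1.0, 1.0, 1.0), gpd_cdf(1.0, 1.0, -0.5))
```

For a negative shape parameter the support is bounded above, which is why the function returns 1 once $1 + \xi x/\sigma$ is no longer positive.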

Let $N$ denote the number of exceedances within
a cluster, and suppose that $\zeta_1 \ge \dots \ge \zeta_N$ are the
corresponding excesses. Then the suggestion above is
that the aggregate excess within a cluster

$$S = \sum_{i=1}^{N} \zeta_i$$

is for some purposes a reasonable measure of the
severity of a storm event. For statistical modelling
we are interested in the distribution of $S$, particularly
for high thresholds $u$. Since $S \ge \zeta_1$ we expect
$S$ to have (in the limit as $u$ increases) a tail no
lighter than that of the limiting generalized Pareto
distribution of $\zeta_1$. The distribution of $S$ is also
expected to reflect the cluster size and the pattern of
dependence between individual excesses $\zeta_i$.

Suppose now that $M_n = \max_{1 \le i \le n} X_i$. It is known
that for many $\{X_i\}$ sequences $M_n$ may be normalized
to converge in distribution to some nondegenerate
limit. Suppose in fact that there is a continuous and
strictly decreasing function $u_n(\tau)$ such that, for each
$\tau > 0$,

$$\lim_{n \to \infty} P(M_n \le u_n(\tau)) = e^{-\tau}. \qquad (2)$$

Let $u_n^{-1}$ denote the inverse function of $u_n$. Consider
now the two-dimensional point process with points
$(j/n, u_n^{-1}(X_j))$. In Ref. [4], which generalizes Ref.
[3], it is proved under a weak long-range mixing
condition $\Delta$ that if this point process converges as
$n \to \infty$ then its limit has points of the form
$(S_i, T_i Y_{ij})$, $i \ge 1$, $1 \le j \le K_i$, where $(S_i, T_i)$, $i \ge 1$, are the
points of a unit Poisson process in $\mathbb{R}_+^2$ and for each
$i$, $\{Y_{ij} : j = 1, \dots, K_i\}$, with $Y_{i1} = 1$, is a point process
on $[1, \infty)$ with a random number $K_i$ of points. Moreover
the processes $\{Y_{ij} : j = 1, \dots, K_i\}$ for each $i$ are
independent of each other and of the $\{(S_i, T_i)\}$ process,
and are identically distributed.

A natural interpretation of this convergence result
is that large values of the $\{X_i\}$ sequence occur
in clusters, located in time at the points of a simple
Poisson process, and that values within a cluster
(from the peak downwards) are given, after transformation,
by $T_i, T_i Y_{i2}, \dots$ respectively (reading upwards).
Note that, since the transformation is
decreasing, a cluster peak corresponds to the lower
endpoint of a vertical string of points in the limiting
point process.

In what follows it will be convenient to suppose
that the point process associated with each cluster
contains infinitely many points $Y_{ij}$ arranged in
increasing order of size

$$1 = Y_{i1} \le Y_{i2} \le \dots$$

but that infinite values of the $Y_{ij}$ are allowed after
the first point, so that $K_i$, the number of points in a
cluster, is just the index of the last finite $Y_{ij}$. By this
means stochastic properties of $K_i$ are subsumed
notationally in those of $\{Y_{ij}\}$.

We are interested in particular in clusters of
exceedances by $\{X_j\}$ of a high threshold $u$. Let
$v = u_n^{-1}(u)$. Then $X_j > u$ is equivalent to
$u_n^{-1}(X_j) < v$, and so, in the limit, clusters of
exceedances of $u$ correspond exactly to those clusters
in the point process for which $T_i < v$. Given that we
are dealing with such a cluster (as we assume from
now on) it follows from the unit Poisson nature of
$\{(S_i, T_i)\}$ that $T_i$ is uniformly distributed over $(0, v)$.

For many $\{X_j\}$ the transformation $u_n$ is related in
a simple way to the marginal distribution function,
$F$ say, of $X_j$. Suppose in fact that $\{X_j\}$, still satisfying
condition $\Delta$, has a positive extremal index $\theta$. Then
([5], Theorem 3.7.2)

$$\lim_{n \to \infty} P(M_n \le u_n(\tau)) = \lim_{n \to \infty} F^{n\theta}(u_n(\tau)). \qquad (3)$$

Hence, if the tail function $1 - F$ of $F$ is denoted by
$\bar F$, it follows from Eqs. (2) and (3) that

$$n\theta \bar F(u_n(\tau)) \sim -n\theta \log F(u_n(\tau)) \to \tau$$

for large $n$. We may therefore define $u_n$ by

$$u_n(\tau) = \bar F^{-1}(\tau / n\theta). \qquad (4)$$

In particular therefore

$$v = u_n^{-1}(u) = n\theta \bar F(u),$$

and so the excesses within a cluster, in decreasing
order of size, are in the limit (dropping the cluster
index $i$, no longer relevant)

$$\zeta_j = u_n(T Y_j) - u = \bar F^{-1}(T' Y_j \bar F(u)) - u, \qquad j = 1, 2, \dots, N, \qquad (5)$$

where $T' = T/v$ is uniformly distributed over $(0, 1)$.
The aggregate excess for the cluster is

$$S = \sum_{j=1}^{N} \zeta_j$$

where $N$, the number of exceedances in the cluster,
is

$$N = \max\{j : T' Y_j < 1\},$$

and $T'$ is independent of the $Y_j$ process.
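
The limiting representation of the excesses can be made concrete with a small sketch. Purely as an assumption for illustration, take the marginal tail function to be exponential, $e^{-x}$, so that its inverse is $-\log p$; the excess of the $j$-th point, $\bar F^{-1}(T' Y_j \bar F(u)) - u$, then reduces to $-\log(T' Y_j)$ and the threshold cancels. The function names and the cluster shape values `ys` are hypothetical.

```python
import math
import random


def cluster_excesses(t_prime, ys):
    """Excesses -log(T'Y_j) for the points with T'Y_j < 1 (exponential tail assumed)."""
    # ys is increasing with ys[0] == 1; math.inf marks "no further points".
    return [-math.log(t_prime * y) for y in ys if t_prime * y < 1.0]


def aggregate_excess(t_prime, ys):
    """Aggregate excess S: the sum of the excesses within the cluster."""
    return sum(cluster_excesses(t_prime, ys))


# Hypothetical cluster shape: a peak, two lower values, then no more points.
ys = [1.0, 1.5, 4.0, math.inf]
rng = random.Random(0)
t_prime = rng.random()  # T' is uniform on (0, 1)
print(len(cluster_excesses(t_prime, ys)), aggregate_excess(t_prime, ys))
```

Smaller values of $T'$ give both larger excesses and more exceedances, which is why the aggregate excess carries information about cluster size as well as peak height.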

3. Asymptotic Distributions of Aggregate Excess

In this section we outline various asymptotic
distributional properties of aggregate excesses which
follow from the preceding discussion. The asymptotic
distribution of aggregate excess $S$ itself turns
out to depend on the $Y$ process partly through
random sums $R_j$ of quantities $Z_{ij}$, which are defined
in terms of $\{Y_j\}$ by

$$Z_{ij} = \begin{cases} (\cdots)^{\xi} - 1 & \text{for } \xi > 0 \\ \log(\cdots) & \text{for } \xi = 0 \\ 1 - (\cdots)^{\xi} & \text{for } \xi < 0. \end{cases} \qquad (6)$$



3.1 Limit Distributions of S 

Suppose that the stationary sequence $\{X_j\}$ satisfies
Hsing's mixing condition $\Delta$ and has positive extremal
index, and that the marginal distribution $F$ of the $X_j$ is
such that the limiting distribution of peak excesses
within a cluster is generalized Pareto with shape
parameter $\xi$. Suppose too that the corresponding point
process $\{(j/n, u_n^{-1}(X_j))\}$ converges to a limiting process
with the structure described in Sec. 2. Then, as the
threshold level $u$ tends to the upper end point, $x_+$ say,
of the support of $X$,



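$$\lim_{u \to x_+} P\!\left(\frac{S}{\gamma_\xi(u)} > s\right) = \begin{cases} E[\,\cdots\,] & \text{for } \xi > 0 \\ E[\,\cdots\,] & \text{for } \xi = 0 \\ E[\,\cdots\,] & \text{for } \xi < 0 \end{cases} \qquad (7)$$

where

$$\gamma_\xi(u) = \begin{cases} u & \text{for } \xi > 0 \\ l(1/\bar F(u)) & \text{for } \xi = 0 \\ x_+ - u & \text{for } \xi < 0 \end{cases} \qquad (8)$$

for a suitable slowly varying function $l$, and

$$J_s = \min\{j : Y_{j+1} = \infty \text{ or } R_j \ge s\}. \qquad (9)$$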



Expectations in Eq. (7) are taken with respect to the 
point process {Yj}. 

This result is a consolidation and re-statement of 
the main limit forms found in Ref. [1]. The proof— 
essentially a weak convergence argument based on 
the Mori-Hsing process— exploits regular and slow 
variation properties of the tail function $\bar F = 1 - F$ implied
by the assumption that cluster peak excesses are, in the
limit, generalized Pareto distributed. For example, when
$\xi = 0$, $F$ belongs to the domain of attraction of the
Gumbel extreme value distribution, so that, as $x \to \infty$,

$$\frac{\bar F^{-1}(1/(wx)) - \bar F^{-1}(1/x)}{l(x)} \to \log w$$

for each $w > 0$, for some slowly-varying function $l$
(see, for example, Ref. [6], Sec. 8.13). Thus

$$\zeta_j \sim \bigl(-\log(T' Y_j)\bigr)\, l(1/\bar F(u)),$$

as $u \to x_+$, which establishes the connection between
the limiting behaviour of $S$ and the $Y$-process.

We note that Eq. (7) reveals in reasonably ex- 
plicit form the dependence of the distribution of S 
on the number and pattern of excesses within a 
cluster. 

3.2 Joint Limit Distributions

The techniques used to obtain these results may 
be extended to give limiting distributions for other 
quantities. As an example (motivated by a question 
from a reservoir engineer about peak water level 
and total overtopping discharge at a dam wall) the 
joint distribution of peak and aggregate excesses is 
as follows. 

Under the same assumptions as in Sec. 3.1, and 
with the same notation: 



$$\lim_{u \to x_+} P\!\left(\frac{S}{\gamma_\xi(u)} > s,\ \frac{\zeta_1}{\gamma_\xi(u)} > z\right) = E\bigl[\min\{\exp(\,\cdots\,),\ e^{-z}\}\bigr] \quad \text{for } \xi = 0. \qquad (10)$$

Similar joint limiting distributions may also be
found for $\zeta_1$ and $S - \zeta_1$. Like Eq. (10) they are
singular. Methods of statistical analysis based on them
have yet to be explored.



3.3 More Explicit Forms for $P(S/\gamma_\xi(u) > s)$

When specific models are assumed for the $X$ process
the limiting distributions Eq. (7) take on more
explicit forms. Several examples were studied in
Ref. [1]. Writing

$$\lim_{u \to x_+} P\!\left(\frac{S}{\gamma_\xi(u)} > s\right) = \begin{cases} \bigl(1 + \operatorname{sign}(\xi)\, V(s, \xi)\bigr)^{-1/\xi} & \text{for } \xi \ne 0 \\ \exp(-V(s, 0)) & \text{for } \xi = 0 \end{cases} \qquad (11)$$

it was found that $V(s, \xi)$ had the same general form
in all cases considered: that of a concave increasing
function of $s$ dominated by $s$ when $\xi \ge 0$, and by
$\min\{1, s\}$ when $\xi < 0$. See Fig. 1.

Fig. 1. Forms of the $V(s, \xi)$ function.

The findings and examples above motivate an at- 
tempt to fit aggregate excess data by a distribution 
with tail function of this general form. Two such 
attempts are described in Sec. 4. 

3.4 Higher Thresholds 

As often in extreme value statistics, an aim in
many applications will be extrapolation to longer
time periods or higher levels than seen in data. In
particular, for aggregate excesses, extrapolations to
higher thresholds will often be of interest. For example,
in flood applications knowledge of the aggregate
excess above a higher threshold might be vital
in estimating the reduction in the size of floods that 
would result from improved river or sea defences. 
The following presents a simple relationship on 
which extrapolation of aggregate excesses could be 
based.

Suppose that $S_u$ and $S_{u'}$ denote aggregate excesses
above levels $u$ and $u'$, respectively, with $u < u'$, in a
cluster in which level $u$ is exceeded (so that $S_u$, but
not necessarily $S_{u'}$, is greater than zero). In a slightly
more refined notation than used earlier, the limiting
forms in Sec. 3.1 are limits, $\mathcal{H}_\xi(s)$ say, of
$P(S_u/\gamma_\xi(u) > s \mid S_u > 0)$ as $u \to x_+$. We are now
interested in $P(S_{u'}/\gamma_\xi(u) > s \mid S_u > 0)$. But

$$P\!\left(\frac{S_{u'}}{\gamma_\xi(u)} > s \,\Big|\, S_u > 0\right) = P(S_{u'} > 0 \mid S_u > 0)\, P\!\left(\frac{S_{u'}}{\gamma_\xi(u)} > s \,\Big|\, S_{u'} > 0\right) \approx \mathcal{H}_\xi\!\left(\frac{\gamma_\xi(u)}{\gamma_\xi(u')}\, s\right) P(\zeta_1 > u') \qquad (12)$$

for high $u$, where $\zeta_1$ is the peak excess in the cluster.
Thus the distribution of aggregate excesses with
respect to the higher threshold $u'$ has a point
probability at 0, $P(\zeta_1 < u')$, corresponding to the event
that no exceedance of $u'$ occurred, together with a
form over the strictly positive half-line which is the
same as that of the original distribution of aggregate
excesses except for an increased scale
parameter. Estimation of this distribution may
therefore be based, through Eq. (12), on estimation
of $\mathcal{H}_\xi$ from data on aggregate excesses of $u$, and of
$P(\zeta_1 > u')$ from data on peak excesses of $u$ fitted to
the generalized Pareto distribution Eq. (1).
Relationship Eq. (12) should also be useful as a means
of checking the fit of specific models for $\mathcal{H}_\xi$, though
this aspect has yet to be investigated.

4. Applications

4.1 Floods on the River Thames

In Ref. [1] an application of some of the limiting
results above to data on levels of the River Thames
is described. The aim was largely exploratory: to see
whether there is support in an important data set
for a model of the general kind suggested in Secs.
3.1 and 3.3, and, if there is, to seek an appropriate
parametric form for the model. The results were
surprisingly positive: confirmation was found for
the general form of distribution predicted by
asymptotic arguments, and in particular a simple
Weibull distribution with

$$P(S > s) = \exp\{-(as)^{\phi}\} \qquad (13)$$

for some parameters $a > 0$ and $\phi$ was found to give
an acceptable fit to data on $S$.
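
A Weibull tail of the form $P(S > s) = \exp\{-(as)^{\phi}\}$ can be fitted by maximum likelihood in a few lines. The sketch below is not the procedure used in Ref. [1], which is not described here; it is one standard way of doing the fit, solving the profile likelihood equation for the shape $\phi$ by bisection. The function name `fit_weibull` and the simulated sample are hypothetical.

```python
import math
import random


def fit_weibull(sample, tol=1e-10):
    """Maximum likelihood fit of P(S > s) = exp(-(a*s)**phi); returns (a, phi)."""
    if len(sample) < 2 or min(sample) <= 0 or min(sample) == max(sample):
        raise ValueError("need at least two distinct positive values")
    top = max(sample)
    z = [x / top for x in sample]  # rescaling leaves the shape equation unchanged
    logs = [math.log(v) for v in z]
    mean_log = sum(logs) / len(z)

    def score(phi):
        # Profile likelihood equation for the shape; increasing in phi.
        w = [v ** phi for v in z]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / phi - mean_log

    lo, hi = 1e-6, 1.0
    while score(hi) < 0.0:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:  # bisection on the bracketed root
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    phi = 0.5 * (lo + hi)
    delta = top * (sum(v ** phi for v in z) / len(z)) ** (1.0 / phi)
    return 1.0 / delta, phi


# Simulated aggregate excesses with a = 0.5 and phi = 1.5 (illustration only).
rng = random.Random(1)
data = [2.0 * (-math.log(1.0 - rng.random())) ** (1 / 1.5) for _ in range(5000)]
print(fit_weibull(data))
```

Dividing the sample by its maximum before raising to the power $\phi$ avoids overflow for large shape values without changing the solution.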

4.2 Ozone Concentrations

An analysis of a further set of data, which calls
for the extension of the models above to take account
of covariate information, is now reported.

The data consist of hourly mean ozone concen- 
trations at a suburban site in Stevenage, about 25 
miles north of London, over the years 1978-1989. 
High levels of ozone are known to cause direct damage
to vegetation (see, for example, Ref. [7]). One
tentative suggestion is that a plant or tree suffers 
damage in proportion to cumulative exposure to 
ozone at concentrations above some threshold. The 
threshold is not known, and indeed is likely to be 
different for different plants, but a figure in the 
range 40 ppb-90 ppb might be plausible. Though 
this theory is at present no more than a working hy- 
pothesis, it prompts an interest in the occurrence of 
high values of aggregate excesses of ozone concen- 
trations above moderately high thresholds. The 
analysis summarized below is a preliminary investi- 
gation into the possibility of using the aggregate ex- 
cess models of Sec. 3 to describe such high doses. A 
more complete account of the biological back- 
ground, and of the application of the method to 
spatial variation of exposure over the UK, is given 
in Ref. [8]. 

For the theory of Secs. 2 and 3 to be applicable it
is desirable that we work with independent clusters
of high values. The hourly data were therefore
subjected to a preliminary declustering procedure,
which selected episodes when concentrations above
a specified 'declustering threshold' were experienced,
and ensured that such episodes were separated
far enough in time to give some plausibility to
the independence assumption. Figure 2 shows a
time plot of the resulting aggregate excesses above
a threshold of 60 ppb, obtained with a declustering
threshold of 50 ppb and with a time separation
between clusters of at least 48 hours, these values
being chosen as typical of those of possible scientific
interest. An immediate observation from the plot is
that the assumption of stationarity is suspect: the
middle years 1982-1986 contain some values higher
than seen earlier or later. (There are known diurnal
patterns in ozone concentrations too, but they are
of too short a duration to affect the present analysis.)
In view of the apparent nonstationarity a simple
model of the kind found useful in the earlier
analysis would not on its own be expected to be
particularly successful here: and indeed the Weibull
model Eq. (13) fitted to aggregate excesses above
60 ppb appears to underestimate the sizes of the
highest aggregates.
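
The declustering step can be sketched as follows. The exact rule used for the Stevenage data is not spelled out, so this is one assumed reading: hours above the declustering threshold are grouped into a cluster until a gap of at least `min_gap` hours occurs, and the aggregate excess of each cluster is then taken over the higher analysis threshold. The function names and the short `hourly` series are hypothetical.

```python
def decluster(series, decluster_threshold, min_gap):
    """Group indices of values above decluster_threshold into clusters.

    A new cluster starts whenever the gap since the previous exceedance is at
    least min_gap time steps (an assumed reading of the declustering rule).
    """
    clusters = []
    last = None
    for i, x in enumerate(series):
        if x > decluster_threshold:
            if last is None or i - last >= min_gap:
                clusters.append([])
            clusters[-1].append(i)
            last = i
    return clusters


def aggregate_excesses(series, clusters, threshold):
    """Sum of excesses over threshold within each cluster (zero if never exceeded)."""
    return [sum(max(series[i] - threshold, 0.0) for i in c) for c in clusters]


# Hypothetical hourly concentrations in ppb (not the Stevenage data).
hourly = [30, 55, 72, 65, 40] + [20] * 60 + [52, 58, 49, 61, 90, 30]
clusters = decluster(hourly, decluster_threshold=50, min_gap=48)
print(aggregate_excesses(hourly, clusters, threshold=60))
```

Clusters that exceed the declustering threshold but never the analysis threshold come out with aggregate excess zero and would be dropped before fitting a model to positive aggregate excesses.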



Fig. 2. Hourly mean ozone concentrations over 60 ppb: Stevenage
1978-1989. Panels: Cluster Peaks; Aggregate Excesses.

The processes leading to the formation of ozone 
in the atmosphere are photo-chemical — driven by 
strong sunlight. It is possible therefore that unusual 
weather conditions in the early to mid 1980s may 
have had some bearing on the possible inhomo- 
geneity. Unfortunately sunlight was not recorded at 
the Stevenage monitoring site, nor was tempera- 
ture, which is a crude surrogate for it. Temperature 
data were not readily obtainable either from nearby 
meteorological stations, but were to hand for 
Sheffield, 140 miles north. Figure 3, showing aggre- 
gate excesses over 60 ppb against monthly averages 
of daily maximum Sheffield temperatures, illus- 
trates that in spite of the geographical separation 
there is nevertheless some connection. It appears 
that the summers over the relevant years contained 
some quite warm spells, presumably experienced in 
Sheffield as well as Stevenage. Accordingly Weibull 
models which incorporate temperature $t$ as a covariate
were fitted. Two forms were used:

$$P(S > s) = \exp\{-(s/\delta(t))^{\phi}\}, \qquad (14)$$

in which the scale parameter $\delta$ depends on $t$ in the
form $\delta(t) = \delta e^{\beta t}$; and secondly a model suggested by



the evidence from Fig. 3 that not all occurrences of 
high temperatures t at the time of an ozone cluster 
are necessarily associated with a high aggregate 
ozone dose. This suggests a model in which ozone 
clusters are assumed to be of two types, the first 
showing temperature dependence of the kind 
above, and the second showing no dependence on 
temperature. Thus 



$$P(S > s) = \begin{cases} \exp\{-(s/\delta(t))^{\phi}\} & \text{for type 1 clusters} \\ \exp\{-(s/\delta')^{\phi}\} & \text{for type 2 clusters} \end{cases} \qquad (15)$$




Fig. 3. Aggregate excess ozone over 60 ppb vs temperature.
Horizontal axis: monthly average of daily maximum temperature,
degrees Centigrade.



(Since sunlight/high temperature is at best only one 
of the preconditions known to be necessary for the 
formation of ozone, there is some general scientific 
justification for a model of this form.) In fitting, 
clusters with aggregate excesses above a specified 
level were taken to be of type 1. A likelihood ratio
test shows that model Eq. (15) represents a very
worthwhile improvement over Eq. (14) even after
allowing for the inclusion of two extra parameters
($W = 21.26$, $p < 10^{-4}$, cut-off level for type 1 = 500).
Q-Q type plots for the two covariate models are
shown in Figs. 4 and 5 respectively. (These are
constructed as follows: under model Eq. (14) $S/(\delta e^{\beta t})$
reduces to a standard Weibull variable with unit
scale parameter and shape parameter $\phi$:
$P(S/(\delta e^{\beta t}) > s) = \exp(-s^{\phi})$. Thus a plot of the ordered
values of $S/(\hat\delta e^{\hat\beta t})$ from a sample of size $n$ against
$[-\log(i/(n+1))]^{1/\phi}$ should yield an approximate line
of unit slope. Figure 4 is a plot of this kind, and
Fig. 5 is constructed similarly from model Eq. (15).)
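
The size of the quoted likelihood ratio statistic can be checked with a short calculation. Assuming the usual chi-squared approximation, with degrees of freedom equal to the two extra parameters, the upper tail probability has the closed form $e^{-W/2}$. The function name below is hypothetical.

```python
import math


def chi2_sf_2df(w):
    """Upper tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-w / 2.0)


w = 21.26  # likelihood ratio statistic quoted in the text
p = chi2_sf_2df(w)
print(f"p = {p:.2e}")
```

The closed form applies only to two degrees of freedom; other parameter counts need the general chi-squared tail function.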
Fig. 4. Q-Q plot for aggregate excess ozone: simple covariate
model Eq. (14).

Fig. 5. Q-Q plot for aggregate excess ozone: two-type covariate
model Eq. (15).
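
The coordinates of a Q-Q type plot for the simple covariate model can be computed as in the sketch below. It assumes fitted values of the scale, the temperature coefficient (written `beta` here, a symbol chosen for illustration) and the shape, standardizes each aggregate excess by its temperature-dependent scale, and pairs the $i$-th largest standardized value with $[-\log(i/(n+1))]^{1/\phi}$. The function name is hypothetical.

```python
import math


def weibull_qq_points(s_values, temps, delta, beta, phi):
    """Pairs (model quantile, standardized value) for a Weibull Q-Q type plot.

    Each aggregate excess is divided by delta * exp(beta * t); the i-th largest
    standardized value is paired with (-log(i / (n + 1))) ** (1 / phi).
    """
    n = len(s_values)
    standardized = sorted(
        (s / (delta * math.exp(beta * t)) for s, t in zip(s_values, temps)),
        reverse=True,
    )
    quantiles = [(-math.log(i / (n + 1))) ** (1.0 / phi) for i in range(1, n + 1)]
    return list(zip(quantiles, standardized))


# Hypothetical aggregate excesses and temperatures, for illustration only.
points = weibull_qq_points([120.0, 15.0, 40.0], [22.0, 14.0, 18.0], 5.0, 0.1, 0.8)
print(points)
```

If the model fits, the points should lie close to the line of unit slope through the origin.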



Both plots appear to show a quite good fit to the
Weibull model after allowing for dependence on
temperature, model Eq. (15) doing a little better
than model Eq. (14). Further refinements of the
models allowing temperature-dependence also of
the shape parameter $\phi$ gave no worthwhile
improvement in fit as judged by a likelihood test.
Though this is only a preliminary analysis (which
we hope to complete with better temperature data),
the results so far are encouraging. They appear to
show again that models of the form suggested in
Sec. 3.3, and in particular a Weibull model, after
allowance in this case for nonstationarity, can
represent aggregate excess data reasonably well. If this
is confirmed, then for example these models will be
useful in estimating return levels of future high
doses of ozone above 60 ppb or, following the
results of Sec. 3.4, above higher thresholds.